On the combinatorics of suffix arrays
We prove several combinatorial properties of suffix arrays, including a
characterization of suffix arrays through a bijection with a certain
well-defined class of permutations. Our approach is based on the
characterization of Burrows-Wheeler arrays given in [1], which we apply by
reducing suffix sorting to cyclic shift sorting through the use of an
additional sentinel symbol. We show that the characterization of suffix arrays
for a special case of binary alphabet given in [2] easily follows from our
characterization. Based on our results, we also provide simple proofs for the
enumeration results for suffix arrays, obtained in [3]. Our approach to
characterizing suffix arrays is the first that exploits their relationship with
Burrows-Wheeler permutations.
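The reduction from suffix sorting to cyclic shift sorting mentioned in this abstract can be illustrated with a small sketch. It is a naive illustration, not the paper's construction: the function names and the choice of `$` as sentinel are assumptions, and the sentinel is assumed to compare smaller than every symbol of the text.

```python
def suffix_array_naive(text):
    """Sort the suffixes of text directly (illustration only, not efficient)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_via_cyclic_shifts(text, sentinel="$"):
    """Obtain the suffix array by sorting cyclic shifts of text + sentinel.

    Assumption: the sentinel is strictly smaller than every symbol of text,
    so each comparison of two shifts is decided no later than the sentinel.
    """
    assert all(sentinel < c for c in text)
    s = text + sentinel
    n = len(s)
    # Sort the start positions of all cyclic shifts of s.
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    # Drop the shift that starts at the sentinel itself.
    return [i for i in order if i != n - 1]


if __name__ == "__main__":
    print(suffix_array_via_cyclic_shifts("banana"))
```

Both functions should agree on any text whose symbols are all larger than the sentinel, which is the point of the reduction.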
On recognizing words that are squares for the shuffle product
Finding Occurrences of Protein Complexes in Protein-Protein Interaction Graphs
In the context of comparative analysis of protein-protein interaction graphs, we use a graph-based formalism to detect the preservation of a given protein complex G in the protein-protein interaction graph H of another species with respect to (w.r.t.) orthologous proteins. Two problems are considered: the Exact-(G; H)-Matching problem and the Max-(G; H)-Matching problem, where G (resp. H) denotes in both problems the maximum number of orthologous proteins in H (resp. G) of a protein in G (resp. H). Following [10], the Exact-(G; H)-Matching problem asks for an injective homomorphism of G to H w.r.t. orthologous proteins. The optimization version is called the Max-(G; H)-Matching problem and is concerned with finding an injective mapping of a graph G to a graph H w.r.t. orthologous proteins that matches as many edges of G as possible. For both problems, we essentially focus on bounded degree graphs and extremal small values of parameters G and H.
Comparing RNA structures using a full set of biologically relevant edit operations is intractable
Arc-annotated sequences are useful for representing structural information of RNAs and have been extensively used for comparing RNA structures in terms of both sequence and structural similarities. Among the many paradigms referring to arc-annotated sequences and RNA structure comparison (see \cite{IGMA_BliDenDul08} for more details), the most important one is the general edit distance. The problem of computing an edit distance between two non-crossing arc-annotated sequences was introduced in \cite{Evans99}. The introduced model uses edit operations that involve either single letters or pairs of letters (never considered separately) and is solvable in polynomial time \cite{ZhangShasha:1989}. To account for other possible RNA structural evolutionary events, new edit operations, which allow the letters of a pair to be considered either simultaneously or separately, were introduced in \cite{jiangli}, unfortunately at the cost of computational tractability. It has been proved that comparing two RNA secondary structures using a full set of biologically relevant edit operations is NP-complete. Nevertheless, in \cite{DBLP:conf/spire/GuignonCH05}, the authors used a strong combinatorial restriction in order to compare two RNA stem-loops with a full set of biologically relevant edit operations, which allowed them to design a polynomial-time and polynomial-space algorithm for comparing general secondary RNA structures. In this paper we prove theoretically that comparing two RNA structures using a full set of biologically relevant edit operations cannot be done without strong combinatorial restrictions.
Comment: 7 pages
Obtaining a Triangular Matrix by Independent Row-Column Permutations
Given a square (0, 1)-matrix A, we consider the problem of deciding whether there exists a permutation of the rows and a permutation of the columns of A such that, after carrying out these permutations, the resulting matrix is triangular. The complexity of the problem was posed as an open question by Wilf [7] in 1997. In 1998, DasGupta et al. [3] seemingly answered the question, proving it is NP-complete. However, we show here that their result is flawed, which leaves the question still open. Therefore, we give a definite answer to this question by proving that the problem is NP-complete. We finally present an exponential-time algorithm for solving the problem.
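To make the decision problem concrete, here is a naive brute-force sketch that tries every pair of row and column permutations. It is not the exponential-time algorithm of the paper, whose details the abstract does not give. Two assumptions are made for illustration: "triangular" is read as "every entry below the main diagonal is 0", and all function names are hypothetical.

```python
from itertools import permutations


def is_upper_triangular(m):
    """Assumed definition: every entry strictly below the diagonal is 0."""
    n = len(m)
    return all(m[i][j] == 0 for i in range(n) for j in range(i))


def find_triangular_permutations(a):
    """Try all (n!)^2 pairs of row/column permutations of the square matrix a.

    Returns a pair (rows, cols) of index tuples if one works, else None.
    Only usable for very small n.
    """
    n = len(a)
    for rows in permutations(range(n)):
        for cols in permutations(range(n)):
            b = [[a[r][c] for c in cols] for r in rows]
            if is_upper_triangular(b):
                return rows, cols
    return None


if __name__ == "__main__":
    print(find_triangular_permutations([[0, 1], [1, 1]]))
    print(find_triangular_permutations([[1, 1], [1, 1]]))
```

The rows and columns are permuted independently, which is what distinguishes this problem from simultaneous row-column reordering.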
Flexible RNA design under structure and sequence constraints using formal languages
The problem of RNA secondary structure design (also called inverse folding)
is the following: given a target secondary structure, one aims to create a
sequence that folds into, or is compatible with, a given structure. In several
practical applications in biology, additional constraints must be taken into
account, such as the presence/absence of regulatory motifs, either at a
specific location or anywhere in the sequence. In this study, we investigate
the design of RNA sequences from their targeted secondary structure, given
these additional sequence constraints. To this purpose, we develop a general
framework based on concepts of language theory, namely context-free grammars
and finite automata. We efficiently combine a comprehensive set of constraints
into a unifying context-free grammar of moderate size. From there, we use
generic algorithms to perform a (weighted) random generation, or an
exhaustive enumeration, of candidate sequences. The resulting method, whose
complexity scales linearly with the length of the RNA, was implemented as a
standalone program. The resulting software was embedded into a publicly
available dedicated web server. The applicability of the method was demonstrated on
a concrete case study dedicated to Exon Splicing Enhancers, in which our
approach was successfully used in the design of in vitro experiments.
Comment: ACM BCB 2013 - ACM Conference on Bioinformatics, Computational
Biology and Biomedical Informatics (2013)
Some algorithmic results for [2]-sumset covers
Let $X = \{x_i : 1 \le i \le n\} \subseteq \mathbb{N}^+$ and $h \in \mathbb{N}^+$. The $h$-iterated sumset of $X$, denoted $hX$, is the set $\{x_1 + x_2 + \dots + x_h : x_1, x_2, \dots, x_h \in X\}$, and the $[h]$-sumset of $X$, denoted $[h]X$, is the set $\bigcup_{i=1}^{h} iX$. An $[h]$-sumset cover of $S \subseteq \mathbb{N}^+$ is a set $X \subseteq \mathbb{N}^+$ such that $S \subseteq [h]X$. In this paper, we focus on the case $h = 2$, and study the APX-hard problem of computing a minimum cardinality [2]-sumset cover $X$ of $S$ (i.e. computing a minimum cardinality set $X \subseteq \mathbb{N}^+$ such that every element of $S$ is either an element of $X$, or the sum of two, not necessarily distinct, elements of $X$). We propose two new algorithmic results. First, we give a fixed-parameter tractable (FPT) algorithm that decides the existence of a [2]-sumset cover of size at most $k$ of a given set $S$. Our algorithm runs in $O(2^{(3 \log k - 1.4)k}\,\mathrm{poly}(k))$ time, and thus outperforms the $O(5^{k^2(k+3)/2}\, k^2 \log(k))$ time FPT result presented in Fagnot et al. (2009) [6]. Second, we show that deciding whether a set $S$ has a smaller [2]-sumset cover than itself is NP-hard.
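The definitions in this abstract can be checked on small instances with a short sketch. It computes the [2]-sumset of a set and searches exhaustively for a minimum cover. This is a naive exponential search for illustration only, not the FPT algorithm of the paper. The function names are hypothetical, and S is assumed to be a non-empty set of positive integers.

```python
from itertools import combinations


def two_sumset(x):
    """[2]X: the elements of X plus all sums of two (possibly equal) elements."""
    xs = set(x)
    return xs | {a + b for a in xs for b in xs}


def is_two_sumset_cover(x, s):
    """True if every element of S lies in [2]X."""
    return set(s) <= two_sumset(x)


def min_two_sumset_cover(s):
    """Exhaustive search for a minimum-cardinality [2]-sumset cover of S.

    Candidates are restricted to 1..max(S), since larger elements cannot
    help cover S. S always covers itself, so the search ends by size |S|.
    """
    s = set(s)
    universe = range(1, max(s) + 1)
    for k in range(1, len(s) + 1):
        for x in combinations(universe, k):
            if is_two_sumset_cover(x, s):
                return set(x)
    return s


if __name__ == "__main__":
    print(two_sumset({1, 2}))
    print(min_two_sumset_cover({1, 2, 3, 4}))
```

Because S is always a [2]-sumset cover of itself, the interesting question, as the abstract notes, is whether a strictly smaller cover exists.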
What Makes the Arc-Preserving Subsequence Problem Hard?
Given two arc-annotated sequences (S, P) and (T, Q) representing RNA structures, the Arc-Preserving Subsequence (APS) problem asks whether (T, Q) can be obtained from (S, P) by deleting some of its bases (together with their incident arcs, if any). In previous studies [3, 6], this problem has been naturally divided into subproblems reflecting the intrinsic complexity of arc structures. We show that APS(Crossing, Plain) is NP-complete, thereby answering an open problem [6]. Furthermore, to get more insight into where the actual border of APS hardness lies, we refine the classical APS subproblems in much the same way as in [11] and give a complete categorization among various restrictions of APS problem complexity.
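As a concrete reading of the APS definition, the following brute-force sketch tries every set of bases of S to keep and checks that the surviving arcs are exactly Q. It runs in exponential time and is meant only to illustrate the problem statement, not any algorithm from the paper. The representation is an assumption: sequences are strings, and arcs are pairs (i, j) of 0-based positions with i < j.

```python
from itertools import combinations


def is_arc_preserving_subsequence(s, p, t, q):
    """Decide whether (t, q) can be obtained from (s, p) by deleting bases.

    Deleting a base also deletes its incident arcs, so the arcs that remain
    are exactly those of p with both endpoints kept.
    """
    p, q = set(p), set(q)
    for kept in combinations(range(len(s)), len(t)):
        # The kept bases must spell t, in order.
        if any(s[pos] != t[k] for k, pos in enumerate(kept)):
            continue
        # Renumber kept positions of s as positions of t.
        index = {pos: k for k, pos in enumerate(kept)}
        induced = {(index[i], index[j]) for (i, j) in p
                   if i in index and j in index}
        if induced == q:
            return True
    return False


if __name__ == "__main__":
    print(is_arc_preserving_subsequence("GACU", {(0, 3)}, "GU", {(0, 1)}))
    print(is_arc_preserving_subsequence("GACU", {(0, 3)}, "GU", set()))
```

The second call shows why arcs matter: "GU" is a plain subsequence of "GACU", but the only way to keep G and U also keeps the arc between them.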